METHOD FOR THE EXPONENTIATION OR SCALAR MULTIPLICATION OF ELEMENTS

The present invention relates to a method for the multi-exponentiation $\prod_{i=1}^{d} g_i^{e_i}$
or the multi-scalar multiplication $\sum_{i=1}^{d} e_i \cdot g_i$ of elements $g_i$ by means of in each case at least
one exponent or scalar $e_i$, in particular an integer exponent or scalar, which has in each case a
maximum bit length n, in particular for the exponentiation $g^e$ or scalar
multiplication $e \cdot g$ of an element g by means of at least one exponent or scalar e, in particular
an integer exponent or scalar, which has in each case a maximum bit length n,
which elements $g_i$; g derive from at least one group G, for example an Abelian group G,
which

in the case of (multi-)exponentiation is notated in particular multiplicatively

and

in the case of (multi-)scalar multiplication is notated in particular additively.
In asymmetric encryption methods or public key cryptosystems which are
based on the intractability of the discrete logarithm problem in Abelian groups, the
exponentiation $g^n$ of a group element g or the multi-exponentiation $g^n h^k$ of a number of
group elements g, h is one of the fundamental operations in signature and key exchange
methods. Acceleration of this fundamental operation is therefore of particular importance.

The possibility of precomputing powers of the group element g presents the
problem that in this case the group element g which is used must be known beforehand. This
is not the case for example in the case of signature verification in the
D[igital]S[ignature]A[lgorithm] or in the E[lliptic]C[urve]D[igital]S[ignature]A[lgorithm] or
in the Diffie-Hellman key exchange method. Added to this is the fact that, on smart cards for
example, there is not enough storage space to store a sufficiently large number of
precomputed elements.

Another possibility lies in recoding the exponent used; this possibility is
independent of the choice of group element g and is therefore particularly attractive for
accelerating the abovementioned signature and key exchange methods.

The techniques for recoding the exponent used in algorithms for
(multi-)exponentiation are based on the fundamental idea that an integer is rewritten in a
different form than the usual binary representation, namely with a lower density and with
coefficients in a finite set of integers C which contains at least the elements 0 and 1.

If, in the specific group in which the computation is carried out, the inversion
of an element is "gratis", that is to say if the computational complexity for the inversion is
very low compared to the other group operations, and if use is made of signed coefficients,
then it can always be assumed that $c \in C$ also implies $-c \in C$. If the inversion is complicated in
computational terms, all the elements of the set C are non-negative integers.

A so-called "square-and-multiply" exponentiation algorithm for the
computation of $g^e$, wherein g is a group element and e is an integer, then operates in a known
manner as follows:
- e is written as $\sum_{i=0}^{n} e_i 2^i$, wherein each coefficient $e_i$ lies in C;
- the elements $g^{e_i}$ are either given or are computed beforehand;
- the temporary variable x is set to $g^{e_n}$;
- for all i = n-1, n-2, ..., 0, x is first squared and then, if the coefficient $e_i$ is
  non-vanishing, multiplied by the element $g^{e_i}$;
- following the last squaring operation carried out for i = 0 and where appropriate
  (namely if coefficient $e_0$ is non-vanishing) following the multiplication by the element
  $g^{e_0}$, the value of the temporary variable x is the desired result $g^e$.
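The steps above can be sketched as follows. This is a minimal illustration, not the method of the invention: it assumes the multiplicative group of integers modulo a prime p (a hypothetical stand-in for G), the coefficient set C = {-1, 0, 1}, and the N[on]A[djacent]F[orm] as the signed recoding. It starts from x = 1 and runs the loop over all digits, which is equivalent to initializing x with the element for the leading coefficient. The function names are illustrative only.

```python
def naf(e):
    """Non-adjacent form of e >= 0, least significant digit first."""
    digits = []
    while e > 0:
        if e & 1:
            d = 2 - (e % 4)  # +1 if e = 1 (mod 4), -1 if e = 3 (mod 4)
            e -= d
        else:
            d = 0
        digits.append(d)
        e >>= 1
    return digits


def square_and_multiply(g, e, p):
    """Compute g**e mod p from a signed-digit recoding of e (illustrative)."""
    # Table of g^c for the non-zero coefficients c in C = {-1, 0, 1}.
    # pow(g, -1, p) (Python 3.8+) plays the role of the "cheap" inversion.
    table = {1: g % p, -1: pow(g, -1, p)}
    x = 1
    for d in reversed(naf(e)):  # most significant coefficient first
        x = (x * x) % p         # squaring
        if d != 0:
            x = (x * table[d]) % p  # multiplication by g^d
    return x
```

The number of multiplications equals the number of non-vanishing digits, which is why a sparse recoding pays off.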

The number of group operations is then approximately equal to the number of
non-vanishing coefficients $e_i$ in the representation $\sum_{i=0}^{n} e_i 2^i$ of the exponent e (these group
operations are multiplications either by precomputed or given group elements or, if the
inversion of group elements is fast, by the inverses thereof) plus
- the length n of the representation (the corresponding, for example n, operations are in
  this case squaring operations) and
- the cardinality of the table of elements $g^c$, wherein $c \in C$ and c is not equal to zero, or
- half this cardinality if the inversion in the given group is fast and the coefficients
  are signed.

A good match between the size of C and the density of the representation is 
the path to optimal performance in the representation of the exponent. 

Examples of exponent recoding include: 
- the N[on]A[djacent]F[orm] (cf. G. W. Reitwiesner, "Binary arithmetic", Advances in
  Computers 1, pages 231 to 308, 1960; S. Arno and F. S. Wheeler, "Signed digit
  representations of minimal Hamming weight", IEEE Transactions on Computers 42,
  1993, pages 1007 to 1010);
- the same-weight method similar to the N[on]A[djacent]F[orm] (cf. M. Joye and S.-M.
  Yen, "Optimal left-to-right binary signed-digit recoding", IEEE Transactions on
  Computers 49 (7), 2000, pages 740 to 748);
- recoding for exponentiation with fixed windows (cf. J. Bos and M. Coster, "Addition
  chain heuristics", in Advances in Cryptology - CRYPTO '89, LNCS 435, 1990, pages
  400 to 407; A. Menezes, P. van Oorschot and S. Vanstone, "Handbook of Applied
  Cryptography", CRC Press, 1996);
- the G[eneralized]N[on]A[djacent]F[orm] (cf. W. E. Clark and J. J. Liang, "On
  arithmetic weight for a general radix representation of integers", IEEE Transactions
  on Information Theory IT-19, 1973, pages 823 to 826);
- "sliding windows" (cf. E. G. Thurber, "On addition chains l(mn) ≤ l(n) - b and lower
  bounds for c(r)", Duke Mathematical Journal 40, 1973, pages 907 to 913;
  A. Menezes, P. van Oorschot and S. Vanstone, "Handbook of Applied
  Cryptography", CRC Press, 1996), optionally on the N[on]A[djacent]F[orm] or on
  other redundant base-2 representations (cf. R. Avanzi, "On the complexity of certain
  multi-exponentiation techniques in cryptography", published in Journal of
  Cryptology; K. Koyama and Y. Tsuruoka, "Speeding up elliptic cryptosystems by
  using a signed binary window method", in E. Brickell (Ed.), "Advances in
  Cryptology, Proceedings of Crypto '92", Lecture Notes in Computer Science Volume
  740, pages 345 to 357, Springer-Verlag, 1992; cf. also K. Koyama, Y. Tsuruoka, "A
  Signed Binary Window Method for Fast Computing over Elliptic Curves", IEICE
  Trans. Fundamentals, Volume E76-A, No. 1, pages 55 to 62, January 1993); and
- the w[indow]N[on]A[djacent]F[orm] (cf. J. A. Solinas, "An improved algorithm for
  arithmetic on a family of elliptic curves", in Advances in Cryptology - CRYPTO '97,
  B. S. Kaliski jr. (Ed.), Lecture Notes in Computer Science Volume 1294, pages 357 to
  371; H. Cohen, "Analysis of the flexible window powering algorithm", advance copy
  available at http://www.math.u-bordeaux.fr/~cohen/ ).

With regard to exponent recoding, however, it should be considered that this
recoding may in many cases not take place "online", that is to say during the exponentiation
itself; for this reason, the recoded exponents must first be stored. However, this storage
requirement is disadvantageous in particular in extremely restricted environments, such as in
smart cards for example, since in such an extremely restricted environment each byte of the
memory is "precious".

Based on the abovementioned disadvantages and shortcomings, and with
reference to the outlined prior art, it is an object of the present invention to further develop a
method of the type mentioned above in such a manner that the requirement in terms of
storage space for recoded exponents or scalars is reduced as much as possible even and
especially in extremely restricted environments, such as in smart cards for example.

This object is achieved by a method having the features specified in claim 1.
Advantageous embodiments and expedient developments of the present invention are
characterized in the dependent claims.

The present invention is thereby based on the principle of almost-online
recoding for single exponentiation or single scalar multiplication or for multi-exponentiation
or multi-scalar multiplication in restricted environments; in this connection, "almost-online"
recoding means that the exponent or scalar is split into sections which are individually
recoded and the recoding of which takes place in layers between parts of the
(multi-)exponentiation or the (multi-)scalar multiplication.

The technique of "almost-online" recoding may be used to reduce the storage
requirement for the recoded exponents or for the recoded scalars. The effects of
almost-online recoding on the total running time of the (multi-)exponentiation or the
(multi-)scalar multiplication are usually minimal.

Based on the abovementioned exemplary recoding operations, in the method
according to the present invention it is assumed that the recoding in the case of
multi-exponentiation or multi-scalar multiplication is of the form $e_i = \sum_{j=0}^{n} b_{i,j} 2^j$; in the case of
(single) exponentiation or (single) scalar multiplication, which is a special case of
multi-exponentiation or multi-scalar multiplication, the assumed basis is accordingly taken as
$e = \sum_{j=0}^{n} b_j 2^j$, wherein $n = \lceil \log_2 e \rceil$ is the bit length of e, that is to say the recoded
representation is at most one digit longer than the binary representation. In other words, this
means that n+1 is to be understood as the maximum length of any recoded exponent or scalar
$e_i = \sum_{j=0}^{n} b_{i,j} 2^j$.

It is furthermore assumed that the recoding algorithm depends - possibly not
explicitly - on a parameter w which usually corresponds to the width of a window over which
the bits of the exponents or scalars $e_i$ are read, or to the upper limit of such a width.

On this basis, according to the teaching of the present invention, the
multi-exponentiation which can be expressed by symbols in the notation $\prod_{i=1}^{d} g_i^{e_i}$, in the case
of a multiplicatively notated group, in particular an Abelian group, G, takes place in the
following steps:

firstly: selecting a chunk or part width L which may be significantly greater than the
         parameter w and significantly shorter than the maximum length of any
         exponent $e_i$;
then:
[a.1]    computing and storing or
[a.2]    retrieving from a memory
         all powers $g_i^c$,
         wherein $g_i$ is an element of the group G and
         c is a permissible positive coefficient;
[b]      dividing each exponent $e_i$, in particular an integer exponent, into a number of
         chunks or into a number of parts having the chunk or part width L selected
         above,
[b.1]    wherein the exponent $e_i$ can be written in the divided form
         $e_i = \sum_{k=0}^{r-1} e_{i,k} 2^{kL}$ where $0 \le e_{i,k} < 2^L$, and
[b.2]    wherein the number r of chunks or parts $e_{i,k}$ can be defined in particular as an
         integer quotient of the maximum bit length n and the chunk or part width L;
[c]      individually recoding the chunks or parts $e_{i,k}$, wherein this recoding can be
         divided into the following substeps for each individual chunk or for each
         individual part $e_{i,k}$ of each exponent $e_i$:
[c.1]    setting a temporary variable x to a standardized value, in particular to the value
         1, wherein 1 denotes the neutral element of the group G with respect to the
         group operation assigned to the group G;
[c.2]    setting a variable k to the values r-1, r-2, ..., 0 (one after the other), wherein for
         each such value k = r-1, r-2, ..., 0 of the variable k the following substeps are
         carried out:
[c.2.i]  for each value i = 1, 2, ..., d of an index i, wherein d is defined as the number
         of elements $g_i$ and of exponents $e_i$ assigned to the elements $g_i$:
[c.2.i.a] recoding the chunk or part $e_{i,k}$ as the sum $\sum_{j=0}^{L} b_{i,j} 2^j$ of powers of two $2^j$
         weighted by in each case a coefficient $b_{i,j}$ deriving from a finite set C of
         integers;
[c.2.i.b] if the coefficient $b_{i,L}$ assigned to the highest power of two $2^L$ does not vanish:
         setting the temporary variable x to the product of x and the power $g_i^{b_{i,L}}$ of the
         element $g_i$ which is assigned to the coefficient $b_{i,L}$ of the highest power of two
         $2^L$;
[c.2.ii] for each value j = L-1, L-2, ..., 0 of the index j:
[c.2.ii.a] squaring the temporary variable x;
[c.2.ii.b] for each value i = 1, 2, ..., d of the index i:
         if the coefficient $b_{i,j}$ assigned to the power of two $2^j$ does not vanish:
         setting the temporary variable x to the product of x and the power $g_i^{b_{i,j}}$
         of the element $g_i$ which is assigned to the coefficient $b_{i,j}$ of the power of
         two $2^j$;
finally: returning x.
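Steps [b] and [c.2.i.a] can be illustrated with a short sketch. It assumes the N[on]A[djacent]F[orm] as the recoding (any of the recodings listed above could be substituted) and pads each recoded chunk to L+1 coefficients, so that the extra top coefficient $b_{i,L}$ is always addressable. The helper names `split_chunks` and `recode_chunk` and the sample exponent are hypothetical.

```python
def naf(e):
    """Non-adjacent form of e >= 0, least significant digit first."""
    digits = []
    while e > 0:
        if e & 1:
            d = 2 - (e % 4)
            e -= d
        else:
            d = 0
        digits.append(d)
        e >>= 1
    return digits


def split_chunks(e, n, L):
    """Step [b]: split e into r chunks e_k with 0 <= e_k < 2**L (e_0 first)."""
    r = -(-n // L)  # number of chunks, rounded up
    mask = (1 << L) - 1
    return [(e >> (k * L)) & mask for k in range(r)]


def recode_chunk(chunk, L):
    """Step [c.2.i.a]: recode one chunk into exactly L + 1 signed coefficients."""
    b = naf(chunk)
    return b + [0] * (L + 1 - len(b))


# Example values (assumed for illustration): a 160-bit exponent, 32-bit chunks.
e, n, L = 0xB7E151628AED2A6ABF7158809CF4F3C762E7160F, 160, 32
chunks = split_chunks(e, n, L)

# Each recoded chunk represents the same value as the original chunk, so the
# recoded chunks reassemble to e.
rebuilt = sum(
    sum(b_j << j for j, b_j in enumerate(recode_chunk(c, L))) << (k * L)
    for k, c in enumerate(chunks)
)
```

Only one recoded chunk per exponent needs to be held in memory at a time, which is the point of the chunk-wise procedure.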

The special case of (single) exponentiation is obtained above for d = 1, that is
to say when there is a single element g and a single exponent e assigned to the element g,
which can de facto be equated with omitting the index i; in this case, an element g is
therefore exponentiated by an exponent e, in particular an integer exponent, having a
maximum bit length n, to form a power $g^e$, wherein the element g once again
derives from a multiplicatively notated Abelian group G.

In an analogous manner, according to the teaching of the present invention,
the multi-scalar multiplication which can be expressed by symbols in the notation $\sum_{i=1}^{d} e_i \cdot g_i$, in
the case of an additively notated group, in particular an Abelian group, G, takes place in the
following steps:

firstly: selecting a chunk or part width L which may be significantly greater than the
         parameter w and significantly shorter than the maximum length of any scalar
         $e_i$;
then:
[a.1]    computing and storing or
[a.2]    retrieving from a memory
         all multiples $c \cdot g_i$,
         wherein c is a permissible positive coefficient and
         $g_i$ is an element of the group G;
[b]      dividing each scalar $e_i$, in particular an integer scalar, into a number of chunks
         or into a number of parts $e_{i,k}$ having the chunk or part width L selected above,
[b.1]    wherein the scalar $e_i$ can be written in the divided form
         $e_i = \sum_{k=0}^{r-1} e_{i,k} 2^{kL}$ where $0 \le e_{i,k} < 2^L$, and
[b.2]    wherein the number r of chunks or parts $e_{i,k}$ can be defined in particular as an
         integer quotient of the maximum bit length n and the chunk or part width L;
[c]      individually recoding the chunks or parts $e_{i,k}$, wherein this recoding can be
         divided into the following substeps for each individual chunk or for each
         individual part $e_{i,k}$ of each scalar $e_i$:
[c.1]    setting a temporary variable x to a standardized value, in particular to the value
         0, wherein 0 denotes the neutral element of the group G with respect to the
         group operation assigned to the group G;
[c.2]    setting a variable k to the values r-1, r-2, ..., 0 (one after the other), wherein for
         each such value k = r-1, r-2, ..., 0 of the variable k the following substeps are
         carried out:
[c.2.i]  for each value i = 1, 2, ..., d of an index i, wherein d is defined as the number
         of elements $g_i$ and of scalars $e_i$ assigned to the elements $g_i$:
[c.2.i.a] recoding the chunk or part $e_{i,k}$ as the sum $\sum_{j=0}^{L} b_{i,j} 2^j$ of powers of two $2^j$
         weighted by in each case a coefficient $b_{i,j}$ deriving from a finite set C of
         integers;
[c.2.i.b] if the coefficient $b_{i,L}$ assigned to the highest power of two $2^L$ does not vanish:
         setting the temporary variable x to the sum of x and the multiple $b_{i,L} \cdot g_i$ of the
         element $g_i$ which is assigned to the coefficient $b_{i,L}$ of the highest power of two
         $2^L$;
[c.2.ii] for each value j = L-1, L-2, ..., 0 of the index j:
[c.2.ii.a] doubling the temporary variable x;
[c.2.ii.b] for each value i = 1, 2, ..., d of the index i:
         if the coefficient $b_{i,j}$ assigned to the power of two $2^j$ does not vanish:
         setting the temporary variable x to the sum of x and the multiple $b_{i,j} \cdot g_i$
         of the element $g_i$ which is assigned to the coefficient $b_{i,j}$ of the power of
         two $2^j$;
finally: returning x.
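The additive variant can be sketched in the same way. The following illustration assumes the additive group of integers modulo m as a simple stand-in for G (an elliptic curve group would replace the modular additions in practice) and the N[on]A[djacent]F[orm] as the recoding, so that negation serves as the cheap inversion. The function name and parameters are hypothetical.

```python
def naf(e):
    """Non-adjacent form of e >= 0, least significant digit first."""
    digits = []
    while e > 0:
        if e & 1:
            d = 2 - (e % 4)
            e -= d
        else:
            d = 0
        digits.append(d)
        e >>= 1
    return digits


def multi_scalar_mul(elements, scalars, m, n, L=8):
    """Compute sum(e_i * g_i) mod m chunk by chunk (illustrative sketch)."""
    r = -(-n // L)            # number of chunks, rounded up
    mask = (1 << L) - 1
    x = 0                     # [c.1] neutral element of the additive group
    for k in range(r - 1, -1, -1):                # [c.2]
        recoded = []
        for g, e in zip(elements, scalars):       # [c.2.i]
            b = naf((e >> (k * L)) & mask)        # [c.2.i.a]
            b += [0] * (L + 1 - len(b))
            recoded.append(b)
            if b[L] != 0:                         # [c.2.i.b]
                x = (x + b[L] * g) % m
        for j in range(L - 1, -1, -1):            # [c.2.ii]
            x = (2 * x) % m                       # doubling
            for g, b in zip(elements, recoded):
                if b[j] != 0:
                    x = (x + b[j] * g) % m        # add the multiple b_ij * g_i
    return x
```

Compared with the multiplicative version, only the neutral element (0 instead of 1), the doubling (instead of squaring) and the sum (instead of product) change.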

The special case of (single) scalar multiplication is obtained above for d = 1,
that is to say when there is a single element g and a single scalar e assigned to the element g,
which can de facto be equated with omitting the index i; in this case, an element g is
therefore multiplied by a scalar e, in particular an integer scalar, having a maximum bit
length n, to give a product $e \cdot g$, wherein the element g once again derives from an
additively notated Abelian group G.

According to one preferred further embodiment of the present invention,
- the recoded chunk or the recoded part $e_{i,k}$ is used once and
- the memory unit in which the recoded chunk or the recoded part $e_{i,k}$ is stored is
  used to recode the following chunk or the following part $e_{i,k-1}$,
so that the storage requirement of (multi-)exponentiation algorithms or (multi-)scalar
multiplication algorithms based on right-to-left recoding of integers can be considerably
reduced.

The present invention furthermore relates to a microprocessor which operates
in accordance with a method of the type described above.

The present invention furthermore relates to a device, in particular a chip card 
and/or in particular a smart card, having at least one microprocessor of the type described 
above. 

The present invention finally relates to the use 
of a method of the type described above and/or 
of at least one microprocessor of the type described above and/or 
of at least one device, in particular of at least one chip card and/or in particular 
of at least one smart card, of the type described above 

in at least one cryptosystem, in particular in at least one public key cryptosystem, in at least 
one key exchange system or in at least one signature system. 

As already mentioned above, there are various possibilities for advantageously 
implementing and developing the teaching of the present invention. In this respect, on the one 
hand reference is made to the claims dependent on claim 1 and on the other hand further 
embodiments, features and advantages of the present invention will be described in more 
detail below on the basis of the exemplary implementation of five examples of embodiments, 
wherein 

the first example of embodiment relates to the method of single 
exponentiation, 

the second example of embodiment relates to the method of 
multi-exponentiation and 

the third example of embodiment likewise relates to the method of 
multi-exponentiation, 

that is to say based on a multiplicative notation for the Abelian group G, and wherein 

the fourth example of embodiment relates to the method of single scalar 

multiplication and 

the fifth example of embodiment relates to the method of multi-scalar 

multiplication, 

that is to say based on an additive notation for the Abelian group G (in the case of such an
additive notation for the Abelian group G, compared to the multiplicative notation for the
Abelian group G in the above section "Prior art", changes and replacements will of course
have to be made, and these are obvious from the different wordings between claim 4 [↔
(multi-)exponentiation: neutral element "1"; "squaring"; "product"] and claim 5 [↔
(multi-)scalar multiplication: neutral element "0"; "doubling"; "sum"]).
The five examples of embodiments shown below in respect of the present
invention are used for a general technique in the form of so-called almost-online recoding,
which can be used to considerably reduce the storage requirement of
- single exponentiation algorithms (cf. first example of embodiment),
- multi-exponentiation algorithms (cf. second example of embodiment and third
  example of embodiment),
- single scalar multiplication algorithms (cf. fourth example of embodiment) or
- multi-scalar multiplication algorithms (cf. fifth example of embodiment)
which are based on right-to-left recoding of integers.

The technique of almost-online recoding may be very useful in extremely
restricted environments, such as in chip cards or in smart cards for example, wherein the
saving in terms of storage space may depend considerably on the specific situation. A
throughput loss, although usually very low, may occur; particularly when the exponent
or scalar is divided into too many small parts (= into too many small "chunks"), the effect on
performance may be noticeable.


First example of embodiment: single exponentiation 

If G is an Abelian group with an order of approximately $2^n$, and it is assumed that an element
$g \in G$ and an integer e are given, the aim according to the invention is to compute $x = g^e$ as
quickly as possible. The recoding according to the invention makes the exponentiation very
quick, but this recoding cannot be used online, that is to say cannot take place during the
exponentiation itself; this is the case for example in the w[indow]N[on]A[djacent]F[orm].

The technique used in almost-online recoding consists in dividing the
exponent e into a number of "exponent chunks", that is to say into a number of exponent
sections or into a number of exponent parts which are considerably longer than w bits but also
much shorter than e. The chunks or parts are then recoded individually, used once, and then
the memory in which the chunks or parts were stored is reused to recode the next chunk or the
next part, so that the total storage space required for the exponents can be significantly
reduced.
The almost-online recoding shown below takes place under the assumption
that the chunks or parts have a length of L bits. The reason that L is much greater than w is
that the estimates for the number of non-vanishing coefficients in recoded exponents are
usually given asymptotically, but the actual number of non-vanishing coefficients in recoded
exponents is sometimes greater on account of a small additive constant, and this is shown
below on the basis of a specific example.

Hereinbelow, within the context of the first example of embodiment of
almost-online recoding, an algorithm is presented in which the following are entered:

a basic element g of the Abelian group G,
an integer e having n bits,
a window width w and
a chunk or part width L ≫ w;
the single exponentiation $g^e$ is output:



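Step 1. $x \leftarrow 1$
Step 2. $r \leftarrow \lceil n/L \rceil$, then $e = \sum_{k=0}^{r-1} e_k 2^{kL}$ for $0 \le e_k < 2^L$
Step 3. for k = r - 1 downto 0 do {
    (a) recode($e_k$) → $e_k = \sum_{j=0}^{L} b_j 2^j$
    (b) if $b_L \ne 0$ then $x \leftarrow x \cdot g^{b_L}$
    (c) for j = L - 1 downto 0 do {
        (i) $x \leftarrow x^2$
        (ii) $x \leftarrow x \cdot g^{b_j}$ } }
Step 4. return x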
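A runnable sketch of Steps 1 to 4 is given below. It assumes the multiplicative group of integers modulo a prime p as a stand-in for G and the w[indow]N[on]A[djacent]F[orm] as the recoding; the function names, the default parameters and the use of modular inverses for negative coefficients are illustrative assumptions, not details taken from the text.

```python
def wnaf(e, w):
    """Width-w NAF of e >= 0, least significant digit first.
    Non-zero digits are odd and lie strictly between -2**(w-1) and 2**(w-1)."""
    digits = []
    while e > 0:
        if e & 1:
            d = e % (1 << w)
            if d >= 1 << (w - 1):
                d -= 1 << w
            e -= d
        else:
            d = 0
        digits.append(d)
        e >>= 1
    return digits


def almost_online_pow(g, e, p, n, w=5, L=32):
    """Compute g**e mod p, recoding e one L-bit chunk at a time (sketch)."""
    # Precomputed odd powers g^c and their inverses (the coefficient table).
    table = {}
    for c in range(1, 1 << (w - 1), 2):
        table[c] = pow(g, c, p)
        table[-c] = pow(table[c], -1, p)   # Python 3.8+
    r = -(-n // L)                          # Step 2: number of chunks
    mask = (1 << L) - 1
    x = 1                                   # Step 1
    for k in range(r - 1, -1, -1):          # Step 3
        b = wnaf((e >> (k * L)) & mask, w)  # (a) recode chunk e_k only
        b += [0] * (L + 1 - len(b))         # pad to coefficients b_0 .. b_L
        if b[L] != 0:                       # (b) extra top coefficient
            x = x * table[b[L]] % p
        for j in range(L - 1, -1, -1):      # (c)
            x = x * x % p                   # (i)
            if b[j] != 0:                   # (ii), skipped when b_j = 0
                x = x * table[b[j]] % p
    return x                                # Step 4
```

At any time only the L+1 coefficients of the current chunk are stored; the list `b` is overwritten in the next iteration.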



It should be noted here that it may happen after L bits that the above algorithm
carries out two group multiplications in a row instead of only one group multiplication. This
happens if one of the chunks $e_i$ (= one of the exponent parts $e_i$) represents an odd number
and if the recoding of the following chunk (= of the following exponent part $e_{i+1}$) is one
coefficient longer (that is to say its highest coefficient is not equal to zero).

Using a specific example in which the selected recoding is the 
w[indow]N[on]A[djacent]F[orm], it can now be shown that the loss in terms of speed is 
minimal and that the saving in terms of storage space may be quite great: 
For n = 160, the optimal value of w is equal to 5 (cf. H. Cohen, "Analysis of
the flexible window powering algorithm", advance copy obtainable at
http://www.math.u-bordeaux.fr/~cohen/ ); seven powers $g^3, g^5, g^7, g^9, g^{11}, g^{13}, g^{15}$ of the basic
element g thus have to be precomputed, and $g^2$ is also temporarily required. At least five bits
per recoded coefficient are required, but the implementor presumably uses complete signed
bytes.

Two recoded exponents require 320 bytes of R[andom]A[ccess]M[emory], but 
two recoded 32-bit chunks (= 32-bit sections or 32-bit parts) require only 66 bytes of 
R[andom]A[ccess]M[emory]. The 254 bytes of R[andom]A[ccess]M[emory] which are saved 
may be used to store six points of an elliptic curve in affine coordinates.
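The figures above can be reproduced with a few lines of arithmetic. The sketch assumes one signed byte per recoded coefficient, n coefficients for a fully recoded exponent and L+1 coefficients for a recoded chunk (which is what the stated byte counts imply), and an affine point stored as two 160-bit coordinates; these storage conventions are assumptions made for illustration.

```python
n, L, exponents = 160, 32, 2

full_bytes = exponents * n            # two fully recoded exponents
chunk_bytes = exponents * (L + 1)     # two recoded 32-bit chunks, L + 1 digits each
saved_bytes = full_bytes - chunk_bytes

point_bytes = 2 * (n // 8)            # affine point: x and y, 20 bytes each
points_that_fit = saved_bytes // point_bytes
reduction = 1 - chunk_bytes / full_bytes   # fraction of recoding storage saved
```

With these assumptions the saving is 254 bytes, room for six affine points, and the recoding storage shrinks by roughly four fifths.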

Cohen has now proven (cf. H. Cohen, "Analysis of the flexible window
powering algorithm", advance copy obtainable at http://www.math.u-bordeaux.fr/~cohen/ )
that the average Hamming weight of the w[indow]N[on]A[djacent]F[orm] of an integer
having n bits (which is the average number of multiplications in the corresponding
exponentiation plus one) is equal to

$$\frac{n}{w+1} + 1 - \frac{0.5\,(w-1)(w+2)}{(w+1)^2} + O(\rho^{-n}),$$

wherein $\rho = \rho(w)$ is a real number greater than one which is dependent only on w and not on
n. In numerical terms, $\rho = \sqrt{2} = 1.414\ldots$ for w = 3,
$\rho = 1.2157\ldots$ for w = 4 and
$\rho = 1.1296\ldots$ for w = 5.

The above theorem with regard to the average Hamming weight of the
w[indow]N[on]A[djacent]F[orm] implies that, when an integer is split into r chunks or into r
parts, the total Hamming weight of the r chunks or r parts is on average

$$(r-1)\left(1 - \frac{0.5\,(w-1)(w+2)}{(w+1)^2}\right)$$

greater than the Hamming weight of the original integer.

In the case where n = 160, there may be selected L = 32 and consequently r =
5. The "flexible window" method requires on average 22/9 = 2.44 fewer group operations
than the almost-online method according to the present invention. This difference is
approximately 1.26 percent of the overall running time of the exponentiation algorithm (over
the 193 group operations, including the time for the precomputations); however, the storage
requirement for the recoded exponents has been reduced by approximately eighty percent.
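The value 22/9 follows directly from the constant term of the average Hamming weight: each of the r chunks contributes the constant once, whereas the undivided exponent contributes it only once in total, leaving r-1 extra copies. A small check with exact fractions, using the figures from the text (r = 5, w = 5, 193 group operations); the function name is illustrative:

```python
from fractions import Fraction


def extra_weight(r, w):
    """Average additional non-zero coefficients caused by splitting into r chunks."""
    constant = 1 - Fraction((w - 1) * (w + 2), 2 * (w + 1) ** 2)
    return (r - 1) * constant


extra = extra_weight(5, 5)       # Fraction(22, 9)
share = float(extra) / 193       # fraction of the 193 group operations
```

The resulting share is a little under 1.3 percent, in line with the approximate figure quoted above.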
Second example of embodiment: multi-exponentiation

The above algorithm from the first example of embodiment (single
exponentiation) can be transformed into a multi-exponentiation method.

If group elements $g_1, \ldots, g_d \in G$ and exponents $e_1, \ldots, e_d$ where d > 1 are given
and $\prod_{i=1}^{d} g_i^{e_i}$ is to be computed, firstly a decision is made to use a sparse recoding of the
exponents $e_i$; use is then made of a "square-and-multiply" loop:

Firstly, all the powers $g_i^c$ are computed and stored, wherein c is a permissible
positive coefficient. A temporary variable x is then set to $1 \in G$. For j = n, n-1, ..., 0, x is first
squared, and for i = 1, ..., d the squared x is multiplied by $g_i^{b_{i,j}}$, wherein $b_{i,j}$ is the coefficient of
$2^j$ in the recoding of $e_i$. At the end, the temporary variable x contains the desired result.

This method is also referred to as fast exponentiation; as in the situation 
according to the first example of embodiment, it is once again desirable to retain the 
advantages of a good right-to-left recoding without having to use too much memory. 

The following variant carries out recoding "almost-online", that is to say 
almost during the fast multi-exponentiation or shortly after the fast multi-exponentiation, 
wherein the following are entered in the algorithm 



basic elements $g_1, \ldots, g_d$ of the Abelian group G,
integers $e_1, \ldots, e_d$ (d > 1) each having at most n bits,
a window width w,
a chunk or part width L ≫ w and
precomputed powers $g_i^c$ for all c in the set of coefficients;

the multi-exponentiation $\prod_{i=1}^{d} g_i^{e_i}$ is output:



Step 1. $x \leftarrow 1$
Step 2. $r \leftarrow \lceil n/L \rceil$, then $e_i = \sum_{k=0}^{r-1} e_{i,k} 2^{kL}$ for i = 1, ..., d
Step 3. for k = r - 1 downto 0 do {
    (a) for i = 1 to d do {
        recode($e_{i,k}$) → $e_{i,k} = \sum_{j=0}^{L} b_{i,j} 2^j$
        if $b_{i,L} \ne 0$ then $x \leftarrow x \cdot g_i^{b_{i,L}}$ }
    (b) for j = L - 1 downto 0 do {
        (i) $x \leftarrow x^2$
        (ii) for i = 1 to d do { if $b_{i,j} \ne 0$ then $x \leftarrow x \cdot g_i^{b_{i,j}}$ }
    } }
Step 4. return x
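The multi-exponentiation variant can be sketched along the same lines. As before, the group is assumed to be the multiplicative group of integers modulo a prime p, the recoding is assumed to be the w[indow]N[on]A[djacent]F[orm], and all names and default parameters are hypothetical choices made for illustration.

```python
def wnaf(e, w):
    """Width-w NAF of e >= 0, least significant digit first."""
    digits = []
    while e > 0:
        if e & 1:
            d = e % (1 << w)
            if d >= 1 << (w - 1):
                d -= 1 << w
            e -= d
        else:
            d = 0
        digits.append(d)
        e >>= 1
    return digits


def almost_online_multi_exp(bases, exps, p, n, w=3, L=32):
    """Compute prod(g_i ** e_i) mod p with chunk-wise recoding (sketch)."""
    # One table of odd powers (and inverses) per base g_i.
    tables = []
    for g in bases:
        t = {}
        for c in range(1, 1 << (w - 1), 2):
            t[c] = pow(g, c, p)
            t[-c] = pow(t[c], -1, p)   # Python 3.8+
        tables.append(t)
    r = -(-n // L)
    mask = (1 << L) - 1
    x = 1
    for k in range(r - 1, -1, -1):
        recoded = []                       # d lists of L + 1 coefficients each
        for t, e in zip(tables, exps):     # Step 3(a)
            b = wnaf((e >> (k * L)) & mask, w)
            b += [0] * (L + 1 - len(b))
            recoded.append(b)
            if b[L] != 0:
                x = x * t[b[L]] % p
        for j in range(L - 1, -1, -1):     # Step 3(b)
            x = x * x % p
            for t, b in zip(tables, recoded):
                if b[j] != 0:
                    x = x * t[b[j]] % p
    return x
```

The squarings are shared by all d exponents, while the recoding buffer holds only d chunks of L+1 coefficients instead of d full recoded exponents.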

The comments made in respect of the algorithm according to the first example
of embodiment are also relevant here, that is to say in the case of elliptic curves over a finite
field where n = 160 and L = 32, 2.44d additional group operations are used, wherein d is the
number of powers which are to be multiplied by one another. Although this is more than in the
case of single fast exponentiation, 254d bytes of R[andom]A[ccess]M[emory] can be saved, that is to
say storage for 6d precomputed points in affine coordinates.

Third example of embodiment: Multi-exponentiation with parallel shifting windows 

In the third example of embodiment, the use of almost-online recoding is 
described in a generalization (cf. R. Avanzi, "On the complexity of certain 
multi-exponentiation techniques in cryptography", published in Journal of Cryptology) of an 
algorithm by Yen, Laih and Lenstra (cf. S.-M. Yen, C.-S. Laih and A. K. Lenstra, 
"Multi-exponentiation", IEE Proc. Comput. Digit. Tech., Volume 141, No. 6, November 
1994). 

In this connection, this third example of embodiment described below serves 
predominantly to explain the basic principles of the described algorithm; the increase in 
efficiency which can be achieved must be deemed to be rather small. The algorithm is 
essentially a variant of the trick by Shamir using a sliding window and is shown below: 

The following are entered in the algorithm:

a window width w,
integers $e_i = \sum_{j=0}^{n} e_{i,j} 2^j$ and
a set E of precomputed elements from the group G of the form $\prod_{i=1}^{d} g_i^{k_i}$
including $g_1, \ldots, g_d$ (the set E is highly dependent on the window width w and
on the representation of the integers $e_i$; cf. the comments made after the
algorithm below);
the multi-exponentiation $\prod_{i=1}^{d} g_i^{e_i}$ is output:

Step 1. $t \leftarrow n$ and $x \leftarrow 1 \in G$
Step 2. if ($e_{i,t} = 0$ for i = 1, 2, ..., d) then {
    (a) $t \leftarrow t - 1$ and $x \leftarrow x^2$
} else {
    (b) if t > w then $t \leftarrow t - w$ else { $w \leftarrow t$ and $t \leftarrow 0$ }



    (d) let j be the greatest natural number $s \ge 0$ such that $2^s \mid f_i$ for all i
    (e) for i = 1, 2, ..., d do $f_i \leftarrow f_i / 2^j$
    (f) (i) $x \leftarrow x^{2^{w-j}}$; (ii) $x \leftarrow x \cdot \prod_{i=1}^{d} g_i^{f_i}$ and (iii) $x \leftarrow x^{2^j}$ }
Step 3. if t = 0 then return x else goto step 2



In this respect, it should be noted that $f_i$ at the start of step 2.(c) is the integer
represented by a chain of w successive bits of the exponent $e_i$. After the normalization step
2.(e), at least one of the $f_i$ is odd.

If in the group G the inversion of elements takes place quickly, the
N[on]A[djacent]F[orm] is selected as the recoding. It can easily be seen that the number of
signed integers having w bits in the N[on]A[djacent]F[orm] is $I_w = (2^{w+2} - (-1)^w)/3$. The set E
contains all the elements of the form $\prod_{i=1}^{d} g_i^{k_i}$ such that

$|k_i| < T_w$ for i = 1, 2, ..., d,
at least one of the $k_i$ is odd and
the first non-vanishing value from the sequence $k_1, k_2, \ldots, k_d$ is positive.

In this way, step 2.(f)(ii) may be carried out either by a multiplication or by a division. The
cardinality of E is $(I_w^d - I_{w-1}^d)/2$.
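Both counts can be checked by brute force. The sketch below enumerates all signed-digit strings of w digits without adjacent non-zero digits, collects the integers they represent, and then counts the exponent tuples that satisfy the two conditions "at least one odd" and "first non-vanishing entry positive". The cardinality is read here as $(I_w^d - I_{w-1}^d)/2$, where $I_{w-1}^d$ counts the tuples whose entries are all even; this reading, like the function names, is an assumption for illustration.

```python
from itertools import product


def naf_values(w):
    """All integers representable by a NAF of at most w digits."""
    values = set()
    for digits in product((-1, 0, 1), repeat=w):
        # Non-adjacency: no two neighbouring digits are both non-zero.
        if all(digits[i] == 0 or digits[i + 1] == 0 for i in range(w - 1)):
            values.add(sum(d << i for i, d in enumerate(digits)))
    return values


def count_E(w, d):
    """Number of tuples (k_1..k_d): some k_i odd, first non-zero k_i positive."""
    count = 0
    for ks in product(sorted(naf_values(w)), repeat=d):
        if not any(k % 2 for k in ks):
            continue
        first_nonzero = next(k for k in ks if k != 0)
        if first_nonzero > 0:
            count += 1
    return count


def I(w):
    """Closed form for the number of w-digit NAF integers."""
    return (2 ** (w + 2) - (-1) ** w) // 3
```

For w = 2 and d = 2 the enumeration yields eight elements, matching the eight precomputed products used for the double exponentiation.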

The parameters w = 2 = d are then fixed and the N[on]A[djacent]F[orm] is
selected for recoding the exponents. The reason for this is the production of digital signatures
with elliptic curves (cf. American National Standards Institute, "ANSI X9.62: Public Key
Cryptography for the Financial Services Industry: The Elliptic Curve Digital Signature
Algorithm (ECDSA)", 1999):

In this case, d = 2, and for the relevant size of the exponents, namely from n =
160 to n = 240, the parameter w = 2 is optimal (cf. R. Avanzi, "On the complexity of certain
multi-exponentiation techniques in cryptography", published in Journal of Cryptology). The
above algorithm from the third example of embodiment is thus used for almost-online
multi-exponentiation with d = 2 = w and the N[on]A[djacent]F[orm], wherein the following
are entered in the algorithm

two (basic) elements $g_1$, $g_2$ of the Abelian group G,
two natural numbers $e_1$, $e_2$ each having at most n bits and
a chunk or part width L where n ≫ L ≫ 2;

the double exponentiation $g_1^{e_1} g_2^{e_2}$ is output:

Step 1 . Precompute the 8 elements g"g b 2 , where either 0 < a < 2 and -2<b <2, 

wherein at least one of a, b is uneven, or a = 0 and & = 1 . [see note A.2] 
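Step 2. $x \leftarrow 1$ 

$r \leftarrow \lceil n/L \rceil$; then $e_i = \sum_{k=0}^{r-1} e_{i,k}\, 2^{kL}$ for $i = 1, 2$ with $0 \le e_{i,k} < 2^L$ 

Step 3. for $k = r - 1$ downto $0$ do { 

(a) for $i = 1, 2$ do recode $e_{i,k}$ as NAF: $e_{i,k} = \sum_{j=0}^{L} v_{i,j}\, 2^j$ 

$a_1 \leftarrow 0$, $a_2 \leftarrow 0$ 

(b) if $(v_{1,L}, v_{2,L}) \ne (0, 0)$ then { 

(i) if $(v_{1,L-1}, v_{2,L-1}) = (0, 0)$ then $x \leftarrow x \cdot (g_1^{v_{1,L}} \cdot g_2^{v_{2,L}})$ 

(ii) else { $a_1 \leftarrow v_{1,L}$, $a_2 \leftarrow v_{2,L}$ } } 

(c) for $j = L - 1$ downto $0$ do { 

(i) $x \leftarrow x^2$ 

(ii) if $(v_{1,j}, v_{2,j}) \ne (0, 0)$ then { 

if $(a_1, a_2) \ne (0, 0)$ then { 

(iii) $a_1 \leftarrow 2a_1 + v_{1,j}$, $a_2 \leftarrow 2a_2 + v_{2,j}$ 

(iv) $x \leftarrow x \cdot (g_1^{a_1} \cdot g_2^{a_2})$ (or: $x \leftarrow x / (g_1^{-a_1} \cdot g_2^{-a_2})$) 
} else { 

if ($j > 0$ and $(v_{1,j-1}, v_{2,j-1}) \ne (0, 0)$) then { 

(v) $a_1 \leftarrow v_{1,j}$, $a_2 \leftarrow v_{2,j}$ 
} else { 

(vi) $x \leftarrow x \cdot (g_1^{v_{1,j}} \cdot g_2^{v_{2,j}})$ (or: $x \leftarrow x / (g_1^{-v_{1,j}} \cdot g_2^{-v_{2,j}})$) 
}}} 

} (End of inner for loop) 
} (End of outer for loop) 
Step 4. return x 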
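
By way of illustration only, the almost-online double exponentiation can be sketched in Python for the multiplicative group modulo a prime p. The choice of group, the names `naf`, `precompute`, `double_exp` and `times`, and the modulus used in the example call are assumptions made for this sketch. In particular, the sketch clears the carry-over pair after the multiplication of step 3.(c)(iv); this reset is an assumption of the sketch and is not taken from the listing above.

```python
def naf(e, length):
    """NAF digits of e >= 0, least significant first, zero-padded to `length`."""
    digits = []
    while e:
        d = 2 - (e % 4) if e & 1 else 0   # +1 if e = 1 mod 4, -1 if e = 3 mod 4
        digits.append(d)
        e = (e - d) >> 1
    return digits + [0] * (length - len(digits))

def precompute(g1, g2, p):
    """Step 1: the 8 products g1^a * g2^b (mod p)."""
    table = {}
    for a in (0, 1, 2):
        for b in (-2, -1, 0, 1, 2):
            if (a > 0 and (a % 2 or b % 2)) or (a, b) == (0, 1):
                table[(a, b)] = pow(g1, a, p) * pow(g2, b, p) % p
    return table

def double_exp(g1, g2, e1, e2, n, L, p):
    """g1^e1 * g2^e2 mod p, recoding one L-bit chunk of each exponent at a time."""
    table = precompute(g1, g2, p)

    def times(x, a, b):
        # multiply by a stored element, or divide by the element of opposite sign
        if (a, b) in table:
            return x * table[(a, b)] % p
        return x * pow(table[(-a, -b)], -1, p) % p

    x = 1                                            # Step 2
    r = -(-n // L)                                   # ceil(n / L)
    mask = (1 << L) - 1
    for k in range(r - 1, -1, -1):                   # Step 3
        v1 = naf((e1 >> (k * L)) & mask, L + 1)      # (a) recode both chunks
        v2 = naf((e2 >> (k * L)) & mask, L + 1)
        a1 = a2 = 0
        if (v1[L], v2[L]) != (0, 0):                 # (b) top coefficient
            if (v1[L - 1], v2[L - 1]) == (0, 0):
                x = times(x, v1[L], v2[L])
            else:
                a1, a2 = v1[L], v2[L]
        for j in range(L - 1, -1, -1):               # (c)
            x = x * x % p
            if (v1[j], v2[j]) != (0, 0):
                if (a1, a2) != (0, 0):
                    a1, a2 = 2 * a1 + v1[j], 2 * a2 + v2[j]
                    x = times(x, a1, a2)
                    a1 = a2 = 0                      # assumed reset of the carry-overs
                elif j > 0 and (v1[j - 1], v2[j - 1]) != (0, 0):
                    a1, a2 = v1[j], v2[j]
                else:
                    x = times(x, v1[j], v2[j])
    return x                                         # Step 4

p = 2**127 - 1                                       # example prime modulus (assumption)
print(double_exp(3, 5, 1000, 2000, 160, 32, p) == pow(3, 1000, p) * pow(5, 2000, p) % p)
```

Only the two recoded chunks `v1` and `v2` are held in memory at any time, which is the point of the almost-online technique; the full exponents are never recoded as a whole.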



It should be noted here that step 3 combines the two nested loops of the above 
algorithm from the first example of embodiment with the simultaneous sequential 
scanning of the exponents in the above first algorithm from the third example of embodiment. 

In steps 3.(c)(ii), 3.(c)(iii), 3.(c)(iv), 3.(c)(v) and 3.(c)(vi), windows of width 2 are 
formed via the coupled N[on]A[djacent]F[orm]s of two chunks or of two parts having L bits. 

Two "carry-overs" $a_1$ and $a_2$ store the values of a non-vanishing column if the 
following column is also non-vanishing, so that the values can be doubled during the next 
iteration and added to the values in the next column; cf. step 3.(c)(iii). Steps 3.(c)(iv) and 
3.(c)(vi) are carried out by a multiplication or by a division. 

If two integers $b_1$ and $b_2$ are then written as $b_i = \sum_{j=0}^{m-1} b_{i,j}\, 2^j$, a column consists 
of a pair of coefficients $(b_{1,j}, b_{2,j})$ from the above representations. The ordered sequence of 
such columns is the joint representation of $b_1$ and $b_2$. The number of non-vanishing 
columns in a joint representation is referred to as the Hamming weight of the 
representation, and the density thereof is the quotient of the Hamming weight to the length m. 

The average density of a joint representation of two 
N[on]A[djacent]F[orm]s is 5/9. It is possible to demonstrate that the number of 
multiplications to be expected in the main loop of the above second algorithm from the third 
example of embodiment is 11n/27 (cf. R. Avanzi, "On the complexity of certain 
multi-exponentiation techniques in cryptography", published in Journal of Cryptology), 
wherein the additional group operations which may be caused by the almost-online technique 
are not counted. 
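
By way of illustration only, the density 5/9 and the expected number 11n/27 of multiplications can be estimated empirically. The following Python sketch works on whole exponents, that is to say without dividing them into chunks; the helper names, the number of trials and the fixed random seed are assumptions of this sketch, and the greedy pairing of adjacent non-vanishing columns is merely one plausible way of counting the windows.

```python
import random

def naf(e):
    """NAF digits of e >= 0, least significant first."""
    digits = []
    while e:
        d = 2 - (e % 4) if e & 1 else 0
        digits.append(d)
        e = (e - d) >> 1
    return digits

def joint_columns(e1, e2):
    """Joint representation: list of columns (b_{1,j}, b_{2,j}), j = 0, 1, ..."""
    v1, v2 = naf(e1), naf(e2)
    m = max(len(v1), len(v2))
    v1 += [0] * (m - len(v1))
    v2 += [0] * (m - len(v2))
    return list(zip(v1, v2))

def hamming_weight(cols):
    """Number of non-vanishing columns."""
    return sum(c != (0, 0) for c in cols)

def window_count(cols):
    """Multiplications when adjacent non-vanishing columns are paired from the top."""
    count, j = 0, len(cols) - 1
    while j >= 0:
        if cols[j] != (0, 0):
            count += 1
            if j > 0 and cols[j - 1] != (0, 0):
                j -= 1                     # the next column joins this window
        j -= 1
    return count

rng = random.Random(1)                     # fixed seed, for reproducibility only
n, trials = 160, 2000
samples = [joint_columns(rng.getrandbits(n), rng.getrandbits(n)) for _ in range(trials)]
density = sum(hamming_weight(c) for c in samples) / (trials * n)
windows = sum(window_count(c) for c in samples) / trials
print(density, 5 / 9)                      # estimated density and asymptotic value
print(windows, 11 * n / 27)                # estimated window count and 11n/27
```

For finite n the estimates lie slightly above the asymptotic values, because the recoded exponents may be one coefficient longer than n.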

The assumption that L is either the native word length of the 
C[entral]P[rocessing]U[nit] of the smart card or a small multiple thereof, for example L = 32, 
also allows simpler implementation. 

Using exponents having 160 bits and taking account of the fact that a 
N[on]A[djacent]F[orm] can efficiently be stored with only two bits per coefficient, 
approximately sixteen bytes of R[andom]A[ccess]M[emory] are required to store the two 
recoded 32-bit chunks (= the two recoded 32-bit sections or the two recoded 32-bit parts) 
instead of the eighty bytes for the full exponents. The saving in terms of storage space 
corresponds to the storage requirement of one point in projective coordinates on an elliptic 
curve over a finite field having 160 bits, and is thus not as considerable as in the two 
preceding examples of embodiments. 
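
The storage figures can be retraced by a short calculation. It is assumed here, for illustration only, that the possible additional top coefficient of each recoding is neglected and that a point in projective coordinates consists of three field elements of 160 bits each.

```latex
\begin{align*}
\text{two recoded 32-bit chunks:} \quad & 2 \cdot 32 \cdot 2\ \text{bits} = 128\ \text{bits} = 16\ \text{bytes} \\
\text{two fully recoded exponents:} \quad & 2 \cdot 160 \cdot 2\ \text{bits} = 640\ \text{bits} = 80\ \text{bytes} \\
\text{saving:} \quad & 80 - 16 = 64\ \text{bytes} \\
\text{one projective point:} \quad & 3 \cdot 160\ \text{bits} = 480\ \text{bits} = 60\ \text{bytes}
\end{align*}
```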

Based on a computer program which counts the number of windows formed 
by the above second algorithm from the third example of embodiment on pairs of numbers of 
given length, the average of the results from one hundred thousand run-throughs of the 
program can then be computed: 

The average number of windows on pairs of numbers having 160 bits is 
65.81153 (it should be noted that (11/27) · 160 = 65.185), the average number of windows on 
pairs of numbers having 32 bits is 13.64216 (it should be noted that (11/27) · 32 = 13.037). 
Consequently, it is to be expected, if n = 160 and L = 32, that the almost-online algorithm 
requires only 5 · 13.64216 - 65.81153 = 2.39927, that is to say about 2.4 more group 
operations than the above first algorithm from the third example of embodiment. 

Since 235 is the total number of group operations of the above first algorithm 
from the third example of embodiment which is to be expected in the case where n = 160, it 
may be estimated that the loss in terms of performance caused by the almost-online technique 
used according to the invention is approximately one percent. 
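
Written out with the figures stated above, the estimate reads as follows; the number of chunks r = n/L = 5 is the only quantity not stated explicitly in the text.

```latex
\begin{align*}
r &= n/L = 160/32 = 5 \\
5 \cdot 13.64216 - 65.81153 &= 68.21080 - 65.81153 = 2.39927 \\
2.39927 / 235 &\approx 0.0102 \approx 1\,\%
\end{align*}
```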

There is an alternative representation to the N[on]A[djacent]F[orm] with the 
same Hamming weight, which can be computed by a simple algorithm that operates from left 
to right (cf. M. Joye and S.-M. Yen, "Optimal left-to-right binary signed-digit recoding", 
IEEE Transactions on Computers 49 (7), 2000, pages 740 to 748). The question may be 
raised as to whether this representation could not be used instead of the almost-online 
recoding. The reason for the negative response is that this alternative does not have the 
N[on]A[djacent]F[orm] property, that is to say the property that of any two successive 
coefficients at least one vanishes. 

The associated effects on the storage requirement are very unfavorable, since a window of 
width 2 can then contain two non-vanishing coefficients of the same exponent. In the present 
case where w = 2 = d, the set E would consist of the elements $g_1^a g_2^b$ with either $0 < a \le 3$ and 
$-3 \le b \le 3$, wherein a and/or b is odd, or a = 0 and b = 1 or b = 3; accordingly, the set E 
would have the cardinality 20; this would make the storage requirement of the above first 
algorithm of the third example of embodiment too great. 
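
By way of illustration only, the cardinalities 8 and 20 can be confirmed by listing the admissible pairs of exponents (a, b). The function name `precomputed_pairs` and its parameter `bound` (the largest absolute value a window can take) are hypothetical and serve this sketch only.

```python
def precomputed_pairs(bound):
    """Pairs (a, b) with |a|, |b| <= bound, at least one entry odd and the
    first non-vanishing entry positive."""
    values = range(-bound, bound + 1)
    return [(a, b) for a in values for b in values
            if (a % 2 or b % 2) and (a > 0 or (a == 0 and b > 0))]

# bound 2: windows over NAFs; bound 3: windows over a recoding without the NAF property
print(len(precomputed_pairs(2)), len(precomputed_pairs(3)))   # prints: 8 20
```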

A similar consideration arises in respect of Solinas' "J[oint]S[parse]F[orm] - 
joint sparse representation" (cf. J. A. Solinas, "Low-Weight Binary Representations for Pairs 
of Integers", Centre for Applied Cryptographic Research, University of Waterloo, 
Combinatorics and Optimization Research Report CORR 2001-41, 2001, obtainable at 
http://www.cacr.math.uwaterloo.ca/techreports/2001/corr2001-41.ps): 

The joint sparse representation recodes the two exponents at the same time 
and in a manner dependent on one another. The average density of the J[oint]S[parse]F[orm] 
is 1/2 and the number of group operations in the main loop of the above first algorithm from 
the third example of embodiment with w = 2 = d is 3n/8 (as before, without including the 
precomputations and costs of almost-online recoding). 

The number of precomputed points is twelve, and this is much greater than the 
number eight in the variant proposed above, without the throughput of the algorithm being 
considerably improved with inputs from 160 bits to 256 bits. For a more detailed discussion 
and for corresponding evidence, reference may be made to Sections 3.3 and 4.4 of H. Cohen, 
"Analysis of the flexible window powering algorithm", advance copy obtainable at 
http://www.math.u-bordeaux.fr/~cohen/. 

Fourth example of embodiment: single scalar multiplication 

Single scalar multiplication in an additively written Abelian group G is 
obtained, in comparison to the above first example of embodiment (single exponentiation), 
by obvious replacements (neutral element "0", "doubling" and "sum" in scalar 
multiplication instead of neutral element "1", "squaring" and "product" in exponentiation) and is 
shown below in the context of the fourth example of embodiment of almost-online recoding 
as an algorithm in which the following are entered 

a basic element g of the Abelian group G, 

an integer e having n bits, 

a window width w and 

a chunk or part width $L \gg w$; 

the (single) scalar multiplication $e \cdot g$ is output: 

Step 1. $x \leftarrow 0$ 

Step 2. $r \leftarrow \lceil n/L \rceil$; then $e = \sum_{k=0}^{r-1} e_k\, 2^{kL}$ for $0 \le e_k < 2^L$ 

Step 3. for $k = r - 1$ downto $0$ do { 

(a) recode $(e_k) \rightarrow e_k = \sum_{j=0}^{L} b_j\, 2^j$ 

(b) if $b_L \ne 0$ then $x \leftarrow x + b_L\, g$ 

(c) for $j = L - 1$ downto $0$ do { 
(i) $x \leftarrow 2x$ 
(ii) $x \leftarrow x + b_j\, g$ } } 

Step 4. return x 

Analogously to the first example of embodiment, it should be noted here that it 
may happen after L bits that the above algorithm carries out two group additions in a 
row instead of only one group addition. This happens if one of the chunks $e_i$ (= one of 
the scalar parts $e_i$) represents an odd number and if the recoding of the following chunk 
(= of the following scalar part $e_{i+1}$) is one coefficient longer ($b_L$ not equal to zero). 
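
By way of illustration only, the algorithm of the fourth example of embodiment can be sketched in Python with the additive group of integers modulo m standing in for the group G. The choice of this group, the use of the N[on]A[djacent]F[orm] as the recoding, the names `naf` and `chunked_scalar_mul` and the record of operations `ops` are assumptions of this sketch.

```python
def naf(e, length):
    """NAF digits of e >= 0, least significant first, zero-padded to `length`."""
    digits = []
    while e:
        d = 2 - (e % 4) if e & 1 else 0
        digits.append(d)
        e = (e - d) >> 1
    return digits + [0] * (length - len(digits))

def chunked_scalar_mul(g, e, n, L, m):
    """Return (e*g mod m, ops); ops records doublings 'D' and additions 'A'."""
    x, ops = 0, ""                               # Step 1
    r = -(-n // L)                               # Step 2: ceil(n / L) chunks
    mask = (1 << L) - 1
    for k in range(r - 1, -1, -1):               # Step 3
        b = naf((e >> (k * L)) & mask, L + 1)    # (a) recode only the chunk e_k
        if b[L] != 0:                            # (b) top coefficient of the chunk
            x = (x + b[L] * g) % m
            ops += "A"
        for j in range(L - 1, -1, -1):           # (c)
            x = 2 * x % m
            ops += "D"
            if b[j] != 0:
                x = (x + b[j] * g) % m
                ops += "A"
    return x, ops                                # Step 4

x, ops = chunked_scalar_mul(5, 31, 8, 4, 1009)
print(x, ops)   # prints: 155 DDDDAADDDDA
```

In the example call the upper chunk is odd and the recoding of the lower chunk 15 is one coefficient longer, so that the record of operations contains two additions in a row.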

Fifth example of embodiment: multi-scalar multiplication 

The above algorithm from the fourth example of embodiment (single scalar 
multiplication) can be transformed into a multi-(scalar) multiplication method. Here, the 
multi-scalar multiplication is obtained in an additively written Abelian group G, in 
comparison to the above second example of embodiment (multi-exponentiation), by obvious 
replacements (neutral element "0", "doubling" and "sum" in multi-scalar multiplication 
instead of neutral element "1", "squaring" and "product" in multi-exponentiation) and is shown 
below in the context of the fifth example of embodiment of almost-online recoding as an 
algorithm. 

If group elements $g_1, \ldots, g_d \in G$ and scalars $e_1, \ldots, e_d$ where d > 1 are given 
and $\sum_{i=1}^{d} e_i \cdot g_i$ is to be computed, firstly a decision is made to use a sparse recoding of the 
scalars $e_1, \ldots, e_d$; use is then made of a "square-and-multiply" loop (in additive notation: a 
"double-and-add" loop): 

Firstly, all the multiples $c \cdot g_i$ are computed and stored, wherein c is a 
permissible positive coefficient. A temporary variable x is then set to $0 \in G$. For $j = n-1, \ldots, 0$, 
x is first doubled, and for $i = 1, \ldots, d$ the operand $e_{i,j} \cdot g_i$ is added to the doubled x, wherein 
$e_{i,j}$ is the coefficient of $2^j$ in the recoding of $e_i$. At the end, the temporary variable x contains 
the desired result. 

This method is also referred to as fast multiplication; as in the situation 
according to the fourth example of embodiment, it is once again desirable to retain the 
advantages of a good right-to-left recoding without having to use too much memory. 

The following variant carries out recoding "almost-online", that is to say 
almost during the fast multi-scalar multiplication or shortly after the fast multi-scalar 
multiplication, wherein the following are entered in the algorithm 

basic elements $g_1, \ldots, g_d$ of the Abelian group G, 
integers $e_1, \ldots, e_d$ (d > 1) each having at most n bits, 
a window width w, 
a chunk or part width $L \gg w$ and 
precomputed multiples $c \cdot g_i$ for all c in the set of coefficients; 

the multi-scalar product $\sum_{i=1}^{d} e_i \cdot g_i$ is output: 



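Step 1. $x \leftarrow 0$ 

Step 2. $r \leftarrow \lceil n/L \rceil$; then $e_i = \sum_{k=0}^{r-1} e_{i,k}\, 2^{kL}$ 

for $0 \le e_{i,k} < 2^L$ and $i = 1, \ldots, d$ 

Step 3. for $k = r - 1$ downto $0$ do { 

(a) for $i = 1$ to $d$ do { 

recode $(e_{i,k}) \rightarrow e_{i,k} = \sum_{j=0}^{L} b_{i,j}\, 2^j$ 
if $b_{i,L} \ne 0$ then $x \leftarrow x + b_{i,L}\, g_i$ } 

(b) for $j = L - 1$ downto $0$ do { 

(i) $x \leftarrow 2x$ 

(ii) for $i = 1$ to $d$ do { if $b_{i,j} \ne 0$ then $x \leftarrow x + b_{i,j}\, g_i$ } } } 

Step 4. return x 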
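
By way of illustration only, the algorithm of the fifth example of embodiment can be sketched in Python, again with the integers modulo m standing in for the additively written group G and with the N[on]A[djacent]F[orm] as the recoding. Both choices, as well as the names `naf` and `multi_scalar_mul`, are assumptions of this sketch; since the coefficients are then only -1, 0 and 1, no table of precomputed multiples is needed here.

```python
def naf(e, length):
    """NAF digits of e >= 0, least significant first, zero-padded to `length`."""
    digits = []
    while e:
        d = 2 - (e % 4) if e & 1 else 0
        digits.append(d)
        e = (e - d) >> 1
    return digits + [0] * (length - len(digits))

def multi_scalar_mul(gs, es, n, L, m):
    """sum_i es[i] * gs[i] mod m, recoding one L-bit chunk of every scalar at a time."""
    x = 0                                            # Step 1
    r = -(-n // L)                                   # Step 2: ceil(n / L) chunks
    mask = (1 << L) - 1
    for k in range(r - 1, -1, -1):                   # Step 3
        # (a) recode the k-th chunk of each scalar and add its top coefficient
        b = [naf((e >> (k * L)) & mask, L + 1) for e in es]
        for i, g in enumerate(gs):
            if b[i][L] != 0:
                x = (x + b[i][L] * g) % m
        for j in range(L - 1, -1, -1):               # (b)
            x = 2 * x % m                            # (i) doubling
            for i, g in enumerate(gs):               # (ii) additions
                if b[i][j] != 0:
                    x = (x + b[i][j] * g) % m
    return x                                         # Step 4

m = 2**61 - 1                                        # example modulus (assumption)
gs, es = [3, 5, 7], [2**159 + 1, 12345, 2**160 - 1]
print(multi_scalar_mul(gs, es, 160, 32, m) == sum(e * g for e, g in zip(es, gs)) % m)
```

At any time only the d recoded chunks of L + 1 coefficients each are stored, instead of the d fully recoded scalars.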

As a final part of the description, a list is given below of the numbers, 
elements, exponents, groups, indices, coefficients, sets, parameters, scalars, variables and 
digits mentioned in the present text: 
$b_{i,j}$ coefficient 
$b_{i,L}$ coefficient assigned to the highest power of two $2^L$ 
c permissible positive coefficient 
C finite set of integers 
d number of (basic or group) elements $g_i$ from the group G 
= number of exponents or scalars $e_i$ assigned to the (basic or group) elements $g_i$ 
e exponent, in particular integer exponent, in the case of single exponentiation or 
scalar, in particular integer scalar, in the case of single scalar multiplication 
$e_i$ exponent, in particular integer exponent, in the case of multi-exponentiation or 
scalar, in particular integer scalar, in the case of multi-scalar multiplication 



WO 2005/088440 PCT/IB2005/050614 

$e_{i,k+1}$ (exponent or scalar) chunk or (exponent or scalar) part following the (exponent or 
scalar) chunk or (exponent or scalar) part $e_{i,k}$ 
$e_{i,k}$ (exponent or scalar) chunk or (exponent or scalar) part of the divided exponent or 
scalar $e_i$ 
g (basic or group) element in the case of single exponentiation or in the case of single 
scalar multiplication 
$g_i$ (basic or group) element in the case of multi-exponentiation or in the case of 
multi-scalar multiplication 
G group, in particular Abelian group 
i index 
j index, in particular summation index 
k variable, in particular indexed variable 
L (exponent or scalar) chunk width or (exponent or scalar) part width, in particular bit 
rate of the (exponent or scalar) chunk width or of the (exponent or scalar) part width 
n maximum bit rate or maximum bit length 
r number of (exponent or scalar) chunks or (exponent or scalar) parts $e_{i,k}$ 
w parameter 
x temporary variable 